AN EVALUATION OF FILTERING TECHNIQUES IN A NAÏVE BAYESIAN ANTI-SPAM FILTER
Authors
Abstract
There is a growing need for an efficient anti-spam filter that blocks all unsolicited messages (spam) without blocking any legitimate messages. To address this problem, this report takes a statistical approach and employs a Bayesian anti-spam filter, because such a filter is content-based and self-learning (adaptive). We train the filter on a large corpus of legitimate messages and spam, and we test it on new incoming personal messages. We evaluate four filtering techniques available for a Bayesian filter. For each technique, we examine its effectiveness and evaluate its configurations under different threshold values in order to find an optimal anti-spam filter configuration. Based on cost-sensitive measures, we conclude that additional safety precautions are needed before a Bayesian anti-spam filter can be put into practice.
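The abstract does not give formulas, so the following is only a minimal sketch of how a thresholded Naïve Bayesian filter and a cost-sensitive measure might fit together. It assumes a multinomial model with add-one smoothing, a decision rule that flags a message as spam when P(spam | message) exceeds λ/(1+λ), and weighted accuracy as the cost-sensitive measure. These choices follow a common formulation in the anti-spam literature and are not confirmed as the report's own. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(messages):
    """messages: iterable of (text, is_spam) pairs. Returns the model counts."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, is_spam in messages:
        class_counts[is_spam] += 1
        word_counts[is_spam].update(tokenize(text))
    return word_counts, class_counts


def spam_probability(text, model):
    """Estimate P(spam | text) with add-one (Laplace) smoothing."""
    word_counts, class_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    total_msgs = sum(class_counts.values())
    log_score = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_msgs)  # class prior
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # assumption: tokens never seen in training are ignored
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        log_score[label] = score
    diff = log_score[False] - log_score[True]
    if diff > 700:  # avoid overflow in math.exp for very long messages
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def classify(text, model, lam):
    """Flag as spam only if P(spam | text) > lam / (1 + lam).

    lam is the assumed cost ratio: blocking a legitimate message is
    treated as lam times worse than letting a spam message through.
    """
    return spam_probability(text, model) > lam / (1.0 + lam)


def weighted_accuracy(results, lam):
    """results: list of (predicted_is_spam, actual_is_spam) pairs.

    Each legitimate message counts lam times, each spam message once.
    """
    legit_total = sum(1 for _, actual in results if not actual)
    spam_total = sum(1 for _, actual in results if actual)
    legit_ok = sum(1 for pred, actual in results if not actual and not pred)
    spam_ok = sum(1 for pred, actual in results if actual and pred)
    return (lam * legit_ok + spam_ok) / (lam * legit_total + spam_total)


# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("cheap pills buy now", True),
    ("win money now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
model = train(TRAIN)

for lam in (1, 9, 999):
    print(lam, classify("buy cheap pills", model, lam))
```

Under these assumptions, raising λ makes the filter more conservative: the same message can be blocked at λ = 1 yet delivered at a stricter setting, which is the kind of trade-off a threshold comparison is meant to expose.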
Similar resources
Variable Thresholding In Naïve Bayesian Spam Filters
Email has become an essential means of communication for both business and personal use. However, the proliferation of unwanted email advertising or spam has cost organizations millions of dollars and has reduced the effectiveness of email as a communications medium. Recently, spam filters have been widely adopted as a means of combating these unwanted messages. This paper presents a method for...
An evaluation of Naive Bayesian anti-spam filtering
It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...
Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach
We investigate the performance of two machine learning algorithms in the context of antispam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...
BUPT at TREC 2006: Spam Track
This report summarizes our participation in the TREC 2006 spam track, in which we consider the use of Bayesian models for the spam filtering task. Firstly, our anti-spam filter, Kidult, is briefly introduced. And then we try to use weighted adjustment of separating hyperplane and selective classifiers ensemble to improve the filtering performance. Finally, we summarize the relevant results from...
Introduction of Fingerprint Vector based Bayesian Method for Spam Filtering
With the development of the diversification of spam, it raises the difficulties and challenges to content-based spam filtering. To address this problem, this paper firstly introduced the statistical features of Email headers, and then proposed a method to use these features to improve Bayesian anti-spam filter. The selected Email-header features are presented as the fingerprint vectors, and the...